Enable multiple treated units in synthetic control quasi experiments #494

drbenvincent · 2025-06-28T12:16:53Z

Closes #456

This PR changes quite a few files, so I'll give my own summary and also include a Copilot-generated summary in case it's useful.

Ben's PR summary

The main aim of this PR is to enable analysis of synthetic control experiments with multiple treated units. Previously this quasi-experimental situation was kind of possible (and outlined in the Multi-cell geolift analysis notebook), though that worked by literally iterating over each treated unit and running multiple independent analyses.

This PR enables one model to be used when there are either one or multiple treated units. The WeightedSumFitter model is still unpooled (each treated unit gets its own independent set of weights). I'm holding off on exploring partial pooling because a PR that enables user-provided priors (#488) is about to be merged, and I'd rather build partial pooling on top of that approach.
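To make "unpooled" concrete, here is a minimal NumPy sketch of the structure (not CausalPy's actual WeightedSumFitter code). It assumes, as in the usual synthetic control setup, that weights are non-negative and sum to one, drawn here from a Dirichlet distribution. Each treated unit gets its own independent weight vector over the control units, and every synthetic control is produced by a single matrix product. The sizes and random data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

n_obs, n_control, n_treated = 100, 5, 3

# Control-unit time series: one column per control unit.
X = rng.normal(size=(n_obs, n_control))

# Unpooled: one independent Dirichlet draw per treated unit.
# Each row is non-negative and sums to 1; nothing is shared across rows.
weights = rng.dirichlet(alpha=np.ones(n_control), size=n_treated)  # (n_treated, n_control)

# All synthetic controls at once: (n_obs, n_control) @ (n_control, n_treated).
mu = X @ weights.T  # (n_obs, n_treated), one column per treated unit
```

Partial pooling would replace the independent rows of `weights` with draws that share a common hyperprior, which is the direction the PR defers until user-provided priors land.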

My initial implementation led to complex code because the WeightedSumFitter model (used for synthetic control) was the only one with a 2D likelihood (dims ["obs_ind", "treated_units"]); all the other models had a 1D likelihood (dims ["obs_ind"]). That meant a lot of branching and special-case handling. So the other large change in this PR is that all likelihood terms are now 2D, which is why there are code changes beyond the WeightedSumFitter and SyntheticControl classes.
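Why a uniform 2D likelihood removes branching can be shown with a small NumPy sketch. The function below is hypothetical (not part of CausalPy): it computes a Gaussian log-likelihood summed over `obs_ind`, giving one value per treated unit. Because a single treated unit is just a trailing axis of length 1, the same code path serves both cases with no `if` on the number of units.

```python
import numpy as np

def normal_loglik_per_unit(y, mu, sigma):
    """Gaussian log-likelihood summed over observations, one value per treated unit.

    y, mu : arrays of shape (n_obs, n_treated_units)
    sigma : array of shape (n_treated_units,), broadcast across obs_ind
    """
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    if y.ndim != 2 or y.shape != mu.shape:
        raise ValueError("y and mu must both have shape (n_obs, n_treated_units)")
    logpdf = -0.5 * np.log(2 * np.pi * sigma**2) - (y - mu) ** 2 / (2 * sigma**2)
    return logpdf.sum(axis=0)  # sum over obs_ind, keep treated_units

rng = np.random.default_rng(0)

# Single treated unit: still 2D, with a trailing axis of length 1.
y_single = rng.normal(size=(50, 1))
print(normal_loglik_per_unit(y_single, np.zeros((50, 1)), np.array([1.0])).shape)  # (1,)

# Three treated units go through exactly the same code path.
y_multi = rng.normal(size=(50, 3))
print(normal_loglik_per_unit(y_multi, np.zeros((50, 3)), np.ones(3)).shape)  # (3,)
```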

We've also got additional (vibe-coded, but manually examined) tests to cover both the single and multi treated unit cases for synthetic control.

I've also re-built the UML diagrams and relevant notebooks for the docs. The main changes are in the Multi-cell geolift analysis notebook, which is updated to reflect the new functionality.

Copilot generated PR summary

This pull request introduces changes across multiple causal inference experiment modules to improve compatibility with multi-dimensional data structures and ensure proper handling of single treated units. Key updates include modifying data array dimensions, adding new coordinates, and refining plotting methods to handle single-unit data.

Changes to Data Array Structures:

causalpy/experiments/diff_in_diff.py: Updated self.y and COORDS to include a new dimension treated_units for better handling of multi-dimensional data.
causalpy/experiments/interrupted_time_series.py: Modified self.pre_y and self.post_y to retain 2D shapes and added treated_units as a coordinate. Adjusted COORDS for PyMCModel compatibility.
causalpy/experiments/prepostnegd.py: Updated self.y and COORDS to include treated_units for improved data structure handling.
causalpy/experiments/regression_discontinuity.py and causalpy/experiments/regression_kink.py: Added treated_units dimension and coordinate to self.y and updated COORDS.
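The pattern these files now share (a 2D outcome plus a treated_units entry in COORDS) can be sketched with a hypothetical helper. The name `make_coords`, the predictor labels, and the `unit_{i}` label scheme are illustrative assumptions, not CausalPy's API; the point is that a single treated unit simply yields a one-element treated_units coordinate.

```python
import numpy as np

def make_coords(labels, y):
    """Hypothetical helper: PyMC-style coords for a 2D outcome.

    y is expected to have shape (n_obs, n_treated_units); a single
    treated unit is simply a trailing axis of length 1.
    """
    y = np.asarray(y)
    if y.ndim != 2:
        raise ValueError("y must have shape (n_obs, n_treated_units)")
    return {
        "coeffs": list(labels),
        "obs_ind": np.arange(y.shape[0]),
        "treated_units": [f"unit_{i}" for i in range(y.shape[1])],
    }

# e.g. a difference-in-differences design with three (illustrative) predictors
coords = make_coords(["Intercept", "group", "post_treatment"], np.zeros((20, 1)))
print(coords["treated_units"])  # ['unit_0']
```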

Adjustments to Plotting Methods:

causalpy/experiments/diff_in_diff.py: Refined plotting methods (_plot_causal_impact_arrow) to use .isel(treated_units=0) for single-unit selection in posterior predictive data.
causalpy/experiments/interrupted_time_series.py: Enhanced _bayesian_plot to handle single treated units in posterior predictive and impact data.
causalpy/experiments/prepostnegd.py: Adjusted _bayesian_plot to use .isel(treated_units=0) for posterior predictive data.
causalpy/experiments/regression_discontinuity.py and causalpy/experiments/regression_kink.py: Updated _bayesian_plot to reflect single treated unit handling in posterior predictive data.
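What `.isel(treated_units=0)` does before plotting can be sketched in plain NumPy, assuming posterior predictive draws laid out as (chain, draw, obs_ind, treated_units). Indexing the last axis is the positional analogue of the named xarray selection. The helper name is hypothetical, and an equal-tailed interval is used for simplicity; the actual plots may use a highest-density interval instead.

```python
import numpy as np

def summarise_unit(pp, unit, prob=0.94):
    """Mean and equal-tailed interval of posterior predictive draws for one treated unit.

    pp has shape (chain, draw, obs_ind, treated_units); pp[..., unit] is the
    positional analogue of xarray's .isel(treated_units=unit).
    """
    pp_unit = pp[..., unit]                           # (chain, draw, obs_ind)
    samples = pp_unit.reshape(-1, pp_unit.shape[-1])  # pool chains and draws
    tail = (1.0 - prob) / 2.0
    lower, upper = np.quantile(samples, [tail, 1.0 - tail], axis=0)
    return samples.mean(axis=0), lower, upper

rng = np.random.default_rng(1)
pp = rng.normal(size=(2, 500, 40, 1))  # a single treated unit
mean, lower, upper = summarise_unit(pp, unit=0)
```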

Compatibility with Single-Unit Data:

causalpy/experiments/interrupted_time_series.py: Added logic to handle single-unit data format for SKL models and adjusted impact calculations accordingly.
causalpy/experiments/synthetic_control.py: Changed dimensions and coordinates in self.datapre_control for consistency with other modules.
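With a 2D outcome, the impact calculation reduces to broadcasting. The sketch below assumes observed data of shape (n_obs, n_units) and counterfactual posterior draws of shape (n_draws, n_obs, n_units); the function name is hypothetical and this is not the PR's implementation. Impact is observed minus predicted for every draw and unit, and the cumulative impact accumulates along the observation axis.

```python
import numpy as np

def causal_impact(y_obs, y_pred_draws):
    """Impact = observed - counterfactual prediction, per draw and treated unit.

    y_obs        : (n_obs, n_units)
    y_pred_draws : (n_draws, n_obs, n_units)
    Returns the per-draw impact and its cumulative sum over time.
    """
    y_obs = np.asarray(y_obs, dtype=float)
    impact = y_obs[np.newaxis, :, :] - y_pred_draws  # broadcast over draws
    cumulative = np.cumsum(impact, axis=1)           # accumulate along obs_ind
    return impact, cumulative
```

Because the unit axis is always present, a single treated unit needs no special handling beyond having n_units equal to 1.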

📚 Documentation preview 📚: https://causalpy--494.org.readthedocs.build/en/494/

codecov · 2025-06-28T12:24:05Z

Codecov Report

Attention: Patch coverage is 98.50374% with 6 lines in your changes missing coverage. Please review.

Project coverage is 95.13%. Comparing base (fdce5b0) to head (192327d).

Files with missing lines	Patch %	Lines
causalpy/tests/test_pymc_models.py	98.20%	3 Missing ⚠️
causalpy/pymc_models.py	95.74%	2 Missing ⚠️
causalpy/experiments/synthetic_control.py	97.50%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #494      +/-   ##
==========================================
+ Coverage   94.59%   95.13%   +0.54%     
==========================================
  Files          28       28              
  Lines        2053     2384     +331     
==========================================
+ Hits         1942     2268     +326     
- Misses        111      116       +5
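The headline project coverage figures follow directly from the table, assuming coverage is hits divided by tracked lines (the report lists no partial lines, and hits plus misses equals lines in both columns):

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Line coverage as a percentage, rounded to two decimals as in the report."""
    return round(100 * hits / lines, 2)

base = coverage_pct(1942, 2053)  # main
head = coverage_pct(2268, 2384)  # this PR
print(base, head, round(head - base, 2))  # 94.59 95.13 0.54
```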


review-notebook-app · 2025-06-28T19:20:50Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB


drbenvincent · 2025-06-29T13:04:27Z

Note to self

Relatively happy with where this is at now.

I should do more manual inspection of the tests (which were vibe coded).

There is definitely scope to remove some conditional branching if we make the likelihood of all models 2-dimensional, changing the shape from (n_obs,) to (n_obs, 1). But I think I'll potentially leave that for another PR.
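One pitfall when promoting (n_obs,) to (n_obs, 1): `np.atleast_2d` prepends the new axis, so it produces the wrong orientation here. A short NumPy illustration:

```python
import numpy as np

y = np.array([3.0, 1.0, 4.0, 1.0, 5.0])  # shape (n_obs,) = (5,)

# np.atleast_2d prepends the new axis: shape (1, 5), which would read as
# a single observation of five treated units.
wrong = np.atleast_2d(y)

# Appending a trailing axis gives (n_obs, 1): one column for the treated unit.
right = y[:, np.newaxis]  # equivalent to y.reshape(-1, 1)
```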

Copilot

Pull Request Overview

This PR enables handling multiple treated units throughout the synthetic control workflow by updating documentation, extending the PyMC models, and adding comprehensive multi-unit tests.

Added sphinx-togglebutton for interactive docs
Extended PyMCModel and WeightedSumFitter to accept and process multiple treated units
Updated SyntheticControl class and added end-to-end tests for multi-unit scenarios
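Registering the extension is a one-line change to the `extensions` list in docs/source/conf.py, as the per-file summary of this PR notes. A sketch of the relevant excerpt, with the project's other extensions omitted:

```python
# docs/source/conf.py (excerpt; the project's other extensions are omitted)
extensions = [
    "sphinx_togglebutton",  # module name of the sphinx-togglebutton package
]
```

With the extension enabled, admonitions given the `dropdown` class render as collapsible blocks, which is presumably how the collapsible admonition in the Multi-cell geolift notebook is produced.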

Reviewed Changes

Copilot reviewed 6 out of 10 changed files in this pull request and generated 3 comments.


File	Description
pyproject.toml	Added `sphinx-togglebutton` to docs dependencies
docs/source/conf.py	Registered `sphinx_togglebutton` extension
causalpy/tests/test_pymc_models.py	Added fixtures and tests for multi-unit `WeightedSumFitter`
causalpy/tests/test_integration_pymc_examples.py	Added fixtures and integration tests for multi-unit `SyntheticControl`
causalpy/pymc_models.py	Updated `_data_setter`, `predict`, `score`, and coefficient printing to support multi-unit
causalpy/experiments/synthetic_control.py	Renamed dims (`control_units` → `coeffs`), parameterized plots and data getters for treated units
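Coefficient printing for several treated units amounts to looping over the treated_units axis. The sketch below is hypothetical (not CausalPy's print_coefficients): it assumes posterior samples shaped (n_draws, treated_units, coeffs), consistent with the renamed `coeffs` dimension, and reports a mean plus an equal-tailed 94% interval per coefficient.

```python
import numpy as np

def format_coefficients(beta_draws, coeff_names, unit_names):
    """Hypothetical per-unit coefficient summary.

    beta_draws : (n_draws, n_treated_units, n_coeffs) posterior samples
    """
    lines = []
    for u, unit in enumerate(unit_names):
        lines.append(f"Treated unit: {unit}")
        for c, name in enumerate(coeff_names):
            draws = beta_draws[:, u, c]
            lo, hi = np.quantile(draws, [0.03, 0.97])
            lines.append(f"  {name}: {draws.mean():.2f}, 94% interval [{lo:.2f}, {hi:.2f}]")
    return lines
```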

causalpy/pymc_models.py

The likelihood of all models is now 2-dimensional. This means we don't have to do conditional branching for single vs multiple treated units, so we've been able to remove a lot of the code in PyMCModel. This has touched a number of experiment classes which are not related to synthetic control.

drbenvincent · 2025-06-30T23:17:53Z

That last commit was quite a big one. Model likelihoods are now 2-dimensional. This means we can avoid a lot of conditional branching based on single vs multiple treated unit situations.

initial commit - utilising vibe coding

59c5b6e

update multi cell geolift notebook with new functionality/API

26691ec

drbenvincent added 19 commits June 28, 2025 20:21

add test that the r2 scores differ across treated units

4fa1650

simplify plot code

7b473af

revert to simpler plot titles

be01357

rename primary_unit_name -> treated_unit + revert to no "Unit" title

47e0a2d

revert change in causal impact shaded region colour

ee8a92b

code simplifications by always having a treated_units dimension

b79743f

code simplification relating to _get_score_title

aa9920a

another code simplification - related to scoring

ebddbb5

code simplification in _ols_plot

eac1ef3

code simplification related to PyMCModel._data_setter

1a5f9bd

add sphinx-togglebutton for a collapsible admonition + other updates to multicell geolift notebook

e67a28c

update uml diagrams

d0fc0d3

PyMCModel.score always to get xr.DataArray arguments

b6f5ca8

simplification to PyMC.print_coefficients

8badc05

simplification of WeightedSumFitter.build_model

4a78a50

clean up PyMCModel.predict + PyMCModel._data_setter

f1849b1

remove a numerical index in favour of a named dimension

a55f97d

make code comment more specific

6341e53

consolidate tests, fix doctest

2befccb

drbenvincent requested a review from Copilot June 29, 2025 13:07

Copilot AI reviewed Jun 29, 2025

causalpy/pymc_models.py (outdated, resolved)

causalpy/pymc_models.py (outdated, resolved)

causalpy/pymc_models.py (outdated, resolved)

drbenvincent added 4 commits June 29, 2025 15:28

Merge branch 'main' into multi-cell-geolift

78be544

set fixture scope to module

37e8de7

towards a more unified scoring (r2) approach

d0c520f

more unification with score (r2) in terms of unified naming: unit_{n}_r2

3ee430e

drbenvincent added 2 commits June 30, 2025 21:48

refactor PyMCModel.score

adf04f9

drbenvincent added 4 commits July 1, 2025 00:35

tweak docstring

2fba338

Merge branch 'main' into multi-cell-geolift

4e63dfa

update class diagram

04ace83

re-run relevant notebooks

192327d
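The scoring commits settle on per-unit keys of the form unit_{n}_r2, and one commit tests that the scores differ across treated units. A minimal sketch of that naming follows. For simplicity it uses the classical R² rather than whatever Bayesian R² PyMCModel.score computes, and `r2_per_unit` is a hypothetical name.

```python
import numpy as np

def r2_per_unit(y_true, y_pred):
    """Hypothetical helper: classical R^2 for each treated unit, keyed as unit_{n}_r2.

    y_true, y_pred : arrays of shape (n_obs, n_treated_units)
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = ((y_true - y_pred) ** 2).sum(axis=0)
    ss_tot = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0)
    r2 = 1.0 - ss_res / ss_tot
    return {f"unit_{n}_r2": float(v) for n, v in enumerate(r2)}
```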

drbenvincent added enhancement New feature or request geo project Related to geo-testing labels Jul 5, 2025

drbenvincent marked this pull request as ready for review July 5, 2025 10:03

drbenvincent requested review from NathanielF and juanitorduz July 5, 2025 10:10
